Abstract


We derive a simple lower bound for the multi-version coding problem formulated in [1]. We also
propose simple algorithms that almost match the derived lower bound. Another lower bound is proven
for an extended version of the multi-version coding problem introduced in [2].


I. Introduction 

We study the multi-version coding problem formulated by Wang and Cadambe [1]. In this
problem, there is a distributed storage system with $n$ servers, and a client with $v$ independent
message versions. The informal description of the problem is as follows. Each time, the client
uploads one version (starting with version 1) by connecting to these $n$ servers. Because of network
failures, a version may not reach all the servers. However, when a version is received
by a server, the server stores some information about that message version (not necessarily
the whole message), and perhaps modifies the information already stored. For example, in the
replication strategy, when a version reaches a server, the server stores the whole version and
deletes any version stored before.

Let $c$, $1 \le c \le n$, be an integer. The multi-version coding problem requires that the client
should be able to download a version $i$, $1 \le i \le v$, by connecting to any set $S$ of $c$ servers, if
version $i$ is the latest version reached by all the servers in $S$. The objective of the problem is to
minimize the worst-case storage cost per server, defined as the size of a server's storage divided
by the size of a message (assuming that all versions have the same size).

By the above definition, the storage cost of the simple replication strategy is one. When
$v < c$, a better strategy, as stated in [1], is to use an $(n, c)$ MDS code for each version. Using
this approach, the worst-case storage cost is $\frac{v}{c}$. Interestingly, it was shown that the cost of $\frac{v}{c}$ can
be slightly reduced for $v = 2$ and $v = 3$ [1]. The authors of [1]
also proved a lower bound of $1 - \left(1 - \frac{1}{c}\right)^v$ for the worst-case storage cost, and hence concluded that
when the number of versions $v$ approaches infinity, the replication strategy is close to optimal.
Their lower bound also indicates that for small values of $v$, MDS codes are almost optimal.

In this work, we prove a new lower bound on the worst-case storage cost. Our lower bound
shows that when $v > c$, the replication strategy is optimal. We propose two algorithms based on
erasure codes that can achieve near-optimal storage cost for any $v \le c$. This answers an open
question raised in [1] on designing codes for moderate values of $v$.

II. Lower Bound 

Proposition 1. The worst-case storage cost of the multi-version coding problem is lower bounded by
$\min\left(1, \frac{v}{c+1}\right)$.

Note that ^ ^ vc-{v-i) 

Proof: Suppose $v \le c$ and $n = c + 1$. Assume that servers $i$, $v + 1 \le i \le c + 1$, were
reached by all the $v$ versions. Also, assume that each server $i$, $1 \le i \le v$, was reached by all the $v$
versions except version $i$. Let $S_i$, $1 \le i \le v$, be the subset of servers that includes all servers except
server $i$. Note that, for every $1 \le i \le v$, $|S_i| = c$, and the latest version reached by all servers in $S_i$ is $i$.
Therefore, we must be able to retrieve version $i$, $1 \le i \le v$, by connecting to $S_i$. This implies
that the set of all $c + 1$ servers must contain information about all $v$ versions, that is, at least $v$ times
the size of a version in total, since the versions are independent. Hence, the storage
cost per server must be at least $\frac{v}{c+1}$ in this setting. Note that, by partitioning the set of servers
into parts of size $c + 1$, this argument is easily generalized to the case where $(c + 1) \mid n$. ■
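
The construction used in the proof can be checked mechanically for small parameters. The following is a minimal Python sketch, not part of the original argument: the function names (`lower_bound`, `adversarial_pattern`, `latest_common_version`, `check_pattern`) are hypothetical, and servers and versions are numbered from 1 as in the proof. It builds the reception pattern for $n = c + 1$ servers and confirms that every set $S_i$ has size $c$ and that its latest common version is $i$.

```python
from fractions import Fraction


def lower_bound(v, c):
    """Bound of Proposition 1: min(1, v / (c + 1))."""
    return min(Fraction(1), Fraction(v, c + 1))


def adversarial_pattern(v, c):
    """Reception sets for n = c + 1 servers, assuming v <= c.

    Server i (1 <= i <= v) misses only version i;
    servers v+1, ..., c+1 receive every version.
    """
    all_versions = set(range(1, v + 1))
    return {
        i: (all_versions - {i}) if i <= v else set(all_versions)
        for i in range(1, c + 2)
    }


def latest_common_version(received, servers):
    """Largest version received by every server in `servers`, or None."""
    common = set.intersection(*(received[s] for s in servers))
    return max(common) if common else None


def check_pattern(v, c):
    """Verify |S_i| = c and that the latest common version of S_i is i."""
    received = adversarial_pattern(v, c)
    for i in range(1, v + 1):
        s_i = [s for s in received if s != i]  # all servers except server i
        if len(s_i) != c:
            return False
        if latest_common_version(received, s_i) != i:
            return False
    return True
```

For example, `check_pattern(3, 5)` holds, and `lower_bound(3, 5)` evaluates to 1/2: the six servers together must hold all three versions.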

III. Simple Near-Optimal Multi-Version Coding Algorithms 

In the following, we informally describe two multi-version coding algorithms. The proposed
algorithms ensure that, at each step of the process, the storage cost per server does not exceed the
maximum storage cost. Also, the information stored for one version does not need to increase
when other versions arrive.

A. First Algorithm 

The first algorithm uses an $(n, c + 1)$ MDS code for versions $1 \le i \le v - 1$, and an $(n, c)$ MDS
code for version $v$ (the last version). Suppose the size of each version is $B$ bits. Upon receiving
a version $i$, $1 \le i \le v - 1$, a server stores $\frac{2B}{c+1}$ bits of coded information for that version, and
reduces the information stored for version $i - 1$ from $\frac{2B}{c+1}$ to $\frac{B}{c+1}$ (if version $i - 1$ has been received
before). Every server that receives version $v$, that is the latest version, just stores $\frac{B}{c}$ bits of coded
information for it. Now, first note that, in the worst case, the total storage cost of a server is 
{v — 1)^ + -f , which is less than Second, if version i, l<i<u — lis the latest 

version reached by a set of $c$ servers, then the total information about version $i$ stored in those
servers is at least $(c - 1)\frac{B}{c+1} + \frac{2B}{c+1} = B$, where $\frac{2B}{c+1}$ is due to the fact that at least one of those
servers has not been reached by version $i + 1$. If version $v$ is the latest version reached by the
servers, then the total information of version $v$ at the servers is clearly $c \cdot \frac{B}{c} = B$.


B. Second Algorithm 

The second algorithm slightly improves the storage cost of the first algorithm to $\frac{vc - (v - 1)}{c^2}$,
which almost matches the lower bound proven. Here, we just explain how storage is assigned
to each version on a server. Using coding, we can easily guarantee that a version is retrievable
from a set of servers as long as the sum of the storage assigned to that version by the servers in the set
is at least $B$ bits.

In the second algorithm, upon receiving the first version, a server stores $\frac{vc - (v - 1)}{c^2} B$ bits of
information. When another version is received, the server deletes $\frac{B}{c}$ bits of information of the
first version, and stores $\frac{B}{c}$ bits of information of the version received. Now consider a set $S$ of
$c$ servers. If the latest version reached by all servers in $S$ is $i > 1$, then each server has $\frac{B}{c}$ bits
of information of that version, so the latest version can be decoded. If the latest version is the
first version, then the total information of the first version stored in all servers in $S$ is at least

$$c\left(\frac{vc - (v - 1)}{c^2} B - (v - 1)\frac{B}{c}\right) + (v - 1)\frac{B}{c} = B,$$

where the term $(v - 1)\frac{B}{c}$ is due to the fact that versions $2, 3, \ldots, v$ are not the latest versions
reached, hence the servers that miss those versions have deleted $\frac{B}{c}$ fewer bits of information from
their first version for each missing version.
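
The storage accounting of the second algorithm can be verified exhaustively for small $v$ and $c$. The sketch below is illustrative only and rests on stated assumptions: the first version initially occupies $\frac{vc - (v-1)}{c^2} B$ bits, every later version occupies $\frac{B}{c}$ bits and removes $\frac{B}{c}$ bits of the first version, a server that never received version 1 simply stores $\frac{B}{c}$ bits per received version, and $B$ is normalized to 1. The names `second_algorithm_cost`, `stored`, and `check_second_algorithm` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def second_algorithm_cost(v, c):
    """Assumed per-server cost (in units of B): (v*c - (v - 1)) / c^2."""
    return Fraction(v * c - (v - 1), c * c)


def stored(received, v, c):
    """Storage (in units of B) kept per version by one server.

    Assumed rule: each received version j > 1 holds 1/c; version 1, if
    received, holds the initial amount minus 1/c per other received version.
    """
    per_version = Fraction(1, c)
    info = {j: per_version for j in received if j != 1}
    if 1 in received:
        info[1] = second_algorithm_cost(v, c) - len(info) * per_version
    return info


def check_second_algorithm(v, c):
    """Brute-force check over all reception patterns of c servers."""
    patterns = [
        frozenset(j + 1 for j in range(v) if (mask >> j) & 1)
        for mask in range(1 << v)
    ]
    # Worst-case storage of a single server over all reception patterns.
    worst = max(sum(stored(r, v, c).values(), Fraction(0)) for r in patterns)
    for servers in product(patterns, repeat=c):
        common = frozenset.intersection(*servers)
        if not common:
            continue  # no common version, nothing has to be decodable
        latest = max(common)
        total = sum(stored(r, v, c)[latest] for r in servers)
        if total < 1:  # fewer than B bits about the latest common version
            return False
    return worst == second_algorithm_cost(v, c)
```

Under these assumptions, `check_second_algorithm(3, 3)` confirms that every set of three servers holds at least $B$ bits of its latest common version, while no server stores more than $\frac{7}{9} B$ bits.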


IV. Extended Multi-Version Coding Problem

In the original multi-version coding problem, the latest version reached by a set of $c$ servers should be
decodable. This can be relaxed, as explained in [2], by requiring the latest version or any later
version to be decodable. In [2], it was shown that the storage cost of the extended multi-version coding
problem is strictly less than that in the original problem. The following lower bound on the
worst-case storage cost per server was proven in [2]:



storage cost >


Note that the above lower bound does not depend on $v$. Here, we prove a lower bound that is
an increasing function of $v$. In particular, we show that the storage cost of the extended multi-version coding
problem is lower bounded by $\frac{v}{c+v-1}$. Then, we show that the bound is tight when $c = vq + 1$
for some non-negative integer $q$.

Proposition 2. The worst-case storage cost for the extended multi-version coding problem is at least
$\frac{v}{c+v-1}$.


Proof: The set of versions reached by a server is called the profile of the server. To prove the
proposition, we construct $m$ profiles iteratively. Then, we consider a set of $m$ servers, each with
one of those profiles, and argue on the minimum amount of information those servers should
have collectively. In the following, we represent a profile with a binary vector of size $v$, where
a "1" in coordinate $i$, $1 \le i \le v$, implies reception of version $i$. Note that a server with a "1"
in coordinate $i$ in its profile has not necessarily stored any information about version $i$. A "0"
in coordinate $i$, however, indicates that version $i$ has not been received, therefore the server will
have no information about version $i$.

The construction of profiles is performed iteratively, starting with profile $p_1 = (1, 1, 1, \ldots, 1)$,
that is, the profile of a server that has received all the versions. Let $p_i$ be the profile constructed
in the $i$th iteration. To construct $p_{i+1}$, we initially set $p_{i+1}$ to $p_i$. If the set of $i + 1$ servers with
profiles $p_1, \ldots, p_i, p_{i+1}$ has at least $B$ bits of information about a version $j$, then we set
coordinate $j$ in vector $p_{i+1}$ to zero. We repeat this process of nullifying coordinates until the
set of $i + 1$ servers with profiles $p_1, \ldots, p_{i+1}$ does not have enough information (that is, $B$ bits of
information) about any version. We terminate if $p_{i+1}$ is a zero vector, and set $m$ to $i$.

First, we show that $m \le c - 1$. By contradiction, assume $m \ge c$. Since each profile is obtained
from the previous one by setting coordinates to zero, and $p_m$ is not the zero vector, there must be a
coordinate $j$ which is equal to one in all the profiles $p_1, p_2, \ldots, p_m$. This is a contradiction,
since, in that case, the set of $c$ servers with profiles $p_1, p_2, \ldots, p_c$ have at least one common
version, hence they can collectively decode at least one version (that is, they must have enough
information about at least one version).

Next we show that, for any version $u$, the set of $m$ servers with profiles $p_1, p_2, \ldots, p_m$
collectively have at least $B - t$ bits of information about version $u$, where $t$ is the maximum storage per
server. Fix any version $u$. Let $1 \le j \le m$ be the first iteration in the profile construction process
where the coordinate corresponding to version $u$ is set to zero. This implies that there is a
profile $p$ such that the set of $j$ servers with profiles $p_1, \ldots, p_{j-1}, p$ have enough information
about version $u$. Note that the maximum amount of information per server for version $u$ is
$t$. Therefore, the set of $j - 1$ servers with profiles $p_1, \ldots, p_{j-1}$ must collectively have at least
$B - t$ bits of information about version $u$. Since this holds for any version, the servers with
profiles $p_1, \ldots, p_m$ must have at least $v(B - t)$ bits of information about all $v$ versions. The
maximum storage per server is $t$. Therefore, we must have $\frac{v(B - t)}{m} \le t$, thus $\frac{v(B - t)}{c - 1} \le t$,
hence $t \ge \frac{v}{c+v-1} B$.

■

Suppose each server only stores information about the latest version received. Without loss of
generality, suppose $B = 1$. Assume that the amount of storage assigned to the latest version is
$\frac{1}{\lceil c/v \rceil}$. Consider a set of $c = vq + d$ servers, where $q$ is a non-negative integer and $1 \le d \le v - 1$.
Assume that each server has received at least one version. This is a more general assumption
than the problem's assumption, which only considers sets of $c$ servers that have at
least one common version. Since each server has received at least one version, there must be at
least $q + 1$ servers with identical latest versions. Each of those servers has assigned $\frac{1}{\lceil c/v \rceil}$ storage
to its latest version. Therefore, the total amount of storage assigned to that version is at least $\frac{q+1}{\lceil c/v \rceil} = 1$.

When $d = 1$, that is when $c = vq + 1$, we get

$$\frac{1}{\lceil c/v \rceil} = \frac{1}{q + 1} = \frac{v}{c + v - 1}.$$

For instance, when $c = v + 1$, it is possible to get the optimal storage cost of $\frac{1}{2}$, which is
almost 50% lower than the minimum storage cost achievable in the original multi-version coding
problem.
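
The pigeonhole argument behind this scheme is easy to test numerically. The following is a small sketch under stated assumptions: $B = 1$, every server keeps exactly $\frac{1}{\lceil c/v \rceil}$ of its latest received version and nothing else, and a version counts as decodable once the storage assigned to it across the $c$ servers sums to at least 1. The function names `latest_only_cost` and `decodable_for_all` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def latest_only_cost(v, c):
    """Assumed per-server storage 1 / ceil(c / v), with B normalized to 1."""
    return Fraction(1, -(-c // v))  # -(-c // v) is the integer ceiling of c / v


def decodable_for_all(v, c):
    """Check every assignment of latest versions to c servers.

    Each server has received at least one version, so its latest version is
    some u in 1..v. Some version must accumulate total storage >= 1.
    """
    per_server = latest_only_cost(v, c)
    for latest in product(range(1, v + 1), repeat=c):
        best = max(latest.count(u) for u in range(1, v + 1))
        if best * per_server < 1:
            return False
    return True
```

For $c = vq + 1$ the assumed cost coincides with the bound of Proposition 2; for instance, `latest_only_cost(3, 7)` and `Fraction(3, 7 + 3 - 1)` are both 1/3.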
We remark that, under the general assumption mentioned above, the storage cost of $\frac{1}{\lceil c/v \rceil}$ is
optimal. The reason is as follows: Consider $v$ groups of servers, each group with $\lceil c/v \rceil$ servers in
it. Note that the total number of servers in all groups is at least $c$. Suppose every server in group
$i$, $1 \le i \le v$, has received only version $i$. If the storage cost per server is less than $\frac{1}{\lceil c/v \rceil}$, then, for any
version $i$, the total information about version $i$ stored by servers in all the $v$ groups will be less
than one. In this case, no version can be decoded by the above set of $v \lceil c/v \rceil \ge c$ servers.

V. Conclusion

Based on the first lower bound derived, the simple replication strategy is optimal if the number
of versions is more than $c$. For a smaller number of versions, there is a simple strategy based on
MDS codes that can almost achieve the lower bound derived. Our second lower bound improves
the lower bound on the storage cost of the extended multi-version coding problem proposed
in [2]. It is also tight for many values of $v$.